Type of property - not equitable to compare apple to oranges.
Density distribution of prices varies across property type.
Insights derived from Take Home Assignment 1
Here are some findings w.r.t PSFand Transaction Prices across all the Property Types:
Finding1: Both the Interquartile Range (IQR) and density distribution is the narrowest for Executive Condominiums, closely followed by Apartments, Condominiums, Semi-Detached, and Terrace Houses. It is the widest for Detached Houses.
What this means is that: The spread or variability of the PSF is the least for Executive Condominium, followed by Apartments, Condominiums, Semi-Detached, and Terrace Houses. In other words, the prices for these property types tend to cluster more closely around the median, as indicated by their smaller Interquartile Ranges (IQRs).On the other hand, Detached Houses have the largest IQR, indicating that the prices for Detached Houses are more spread out across a wider range. This means that there can be significant variability in prices for Detached Houses, with some properties priced much higher or lower than the median.
Finding2: The Median is the lowest for Detached Houses, followed by Semi-Detached Houses, Executive Condominiums, and Terrace Houses. Condominiums and Apartments have the highest median PSF. However, upon comparing this with the distribution plot in Chart II, it becomes apparent that Property Types such as Apartments, Condominiums, and Executive Condominiums also generally have lower median Transacted Pricesdespite having higher PSF.
What this means is that: Comparing PSF between dissimilar property types, such as a condo and a landed property, may not provide an accurate sense of value and affordability. When comparing properties for value, PSF is best used between two similar property type or better, within the same project or block.(Chong, 2020)
2.2 Critique
2.2.1 - Clarity
No
Area of Concern
Potential confusion
1
Misaligned title of 2 separate charts of different font sizes
Plot II on Transacted Price seems like a subplot of Plot I on Unit Price ($PSF) when the 2 plots should be equal comparable to convey the message of non-linear correlation.
2
Plot labels of “I” and “II” is not obvious with missing plot boundaries
With plot II’s label intersecting with plot I’s grid, it seems that both “I” and “II” are annotations on plot “I”, but does not contribute much meaningful insights as redundant labels.
3
Misaligned y axis makes it difficult to compare apple-to-apple for various Property Type.
With different plot size and un-level y-axis for Plot I and Plot II, viewers are subjected to transposition error when attempting to compare across Property Type.
4
Small dot plot size makes visualising of raw data bins less meaningful as the position of dots seems to contribute to a similar linear line for “Apartment, Condo, Terrace”.
The purpose of rain cloud plots is to visualise raw data with dot plot, distribution of data with half-violin plot and key summary statistics with box-plot.
However, key statistic annotations were not included on box plot, and the dot plot size is too tiny to see distribution of underlying data.
5
Plot II’s x-axis on Transacted Price ($) does not have values displayed.
Lack of labels on Transacted Price ($) x-axis for Plot II makes it difficult for viewers to understand the magnitude of the price distribution.
Warning
Comparing Unit Price ($PSF) and Transacted Price ($) directly can be potentially misleading with unit size (Area (SQFT)) being another key influencing factor. For instance, while some detached houses have the lowest Unit ($PSF) it is not the most affordable due to its large unit sizes making total price much higher and less affordable.
2.2.2 - Aesthetic
No
Area of Interest
Potential turn-off
1
Different font size for Plot I and Plot II charts
Sub-title is a tongue twister and is over-generalised.
Smaller title size of Plot II may lead to viewers de-priortizing it’s importance.
Readability of subtitle can be improved such as including key statistics to capture reader’s attention and annotations to support the narrative of the disproportionate relationship between transacted price and unit price across property types.
2
Plot I’s color scheme is lacking contrast with grey plot with grey background
With grey plot on grey background, this gives the impression of a plain chart and does not capture viewer’s attention. Moreover, the characteristics of property type are not captured within the chart (e.g., High rise vs landed)
3
Plot II’s ridges are opaque (dark purple) and overlapping clusters at the left for most Property Type
Viewers are unable to identify the height of the sales distribution and with the density plot being based on each Property Type, the relative units sold is not included within the plot. (e.g., Apartment & Condo having much more sale than Terrace, but height is similar)
4
Plot I lack of grid lines to reference Unit Price ($PSF).
While grid lines are enforced in Plot I’s underlying R code, the contrast is not obvious, especially for reference to Unit Price ($PSF) values that is of interest to viewers.
2.3 Key Areas for Improvement with Proposed Sketch
Original Visualisation
Recommendation
Reasoning
X: Unit Price ($PSF)
Y: Property Type
Flipping axis to X: Property Type and Y: Unit Price ($PSF).
Alignment of 2 plots to similar size for comparison.
As property type is a categorical variable, placing it on x-axis enables easier comparison where viewers can scan across property types easily.
More importantly, “Unit Price ($PSF) is a continuous variable and y-axis enables easier scaling when combining 2 charts together.
Order of property type based on Alphabetical order from low to high
Re-ordering property type axis variables for easier reference to comparables based on property type characteristics and arrange by median Unit Price ($PSF).
Introducing colors to group property of similar nature and making the chart more visually appealing.
This reordering helps viewers easily compare asset classes of similar nature, and reordering by median price makes it visually appealing to see the increase over asset class.
Grid lines and color of background is not contrasting for viewers.
Include soft grid lines at Unit Price ($PSF) intervals to guide users when reading the Y-axis and enforce “0” line.
As the plot spreads across 6 property type, it is difficult to quickly identify the “Unit Price ($PSF) with viewer’s eyeballing. Grid lines will help guide more accurate.
3 Revamping
3.1 Preparation
Loading relevant library and raw data file with read_csv.
Pre-processing of data to narrow scope of visualisation to Individual Purchases (No of Units Sold = 1) and Tenure of 99 years.
Code
# Loading packagespacman::p_load(tidyverse,ggrepel, patchwork, hrbrthemes, forcats,ggdist, ggridges, ggthemes,lubridate, knitr,ggiraph, plotly, gganimate, cowplot, grid, gridExtra)# Import datarealis_1 <-read_csv("data/ResidentialTransaction20240308160536.csv")realis_2 <-read_csv("data/ResidentialTransaction20240308160736.csv")realis_3 <-read_csv("data/ResidentialTransaction20240308161009.csv")realis_4 <-read_csv("data/ResidentialTransaction20240308161109.csv")realis_5 <-read_csv("data/ResidentialTransaction20240414220633.csv")combined_data <-rbind(realis_1, realis_2, realis_3, realis_4, realis_5)# Following the same data processing steps## Dropping records where Area (SQM) is not nullcombined_data <- combined_data %>%filter(!is.na(`Area (SQM)`))## Converting sale date to date formatcombined_data <- combined_data %>%mutate(`Sale Date`=as.Date(`Sale Date`, format ="%d %b %Y"))## Removing duplicates - There are no dupicates. duplicates <- combined_data[duplicated(combined_data$Address), ]filtered_data <- combined_data[combined_data$Address %in% duplicates$Address, ] ## Including conditions to focus visualisation on individual sales and Tenure of 99 years. # Define the new labels for each categorycombined_data = combined_data %>%mutate(`Property Type`=recode(`Property Type`, "Apartment"="Apartment","Condominium"="Condo","Executive Condominium"="Executive Condos","Terrace House"="Terrace","Semi-Detached House"="Semi- Detached","Detached House"="Detached"))# Filter Conditionsresale_data <- combined_data %>%filter(`Number of Units`==1)%>%filter(grepl("99 yrs", Tenure, ignore.case =TRUE))
3.2 Revising Plot I - Distribution of Unit Price ($ PSF) across Property Types
Adding new category of prop_clust to label property type into “Landed” and “High Rise”.
Enforcing factoring to reorder the Property Type to “Executive Condos”,“Apartment”, “Condo”, “Terrace”, “Semi- Detached”,“Detached”.
Introducing fixed color scheme of blue for “High Rise” and orange for “Landed”
Calculating key statistics such as “Median” for subsequent annotation.
Code
## Segmenting into 2 clusters of property type and use facet_grid()resale_data <- resale_data %>%mutate(prop_clust =case_when(`Property Type`%in%c("Detached", "Semi- Detached", "Terrace") ~"Landed", `Property Type`%in%c("Condo", "Apartment", "Executive Condos") ~"High rise"))## Introducing factors to reorder the recordsprop_order <-c("Executive Condos","Apartment", "Condo", "Terrace", "Semi- Detached","Detached")resale_data$`Property Type`<-factor(resale_data$`Property Type`, levels = prop_order)## Introducing fill color to segment the different property typesprop_col <-c("Apartment"="#CCE5FF", "Executive Condos"="#99FFFF", "Condo"="#99CCff", "Terrace"="#FFE5CC", "Semi- Detached"="#FFCC99", "Detached"="#FF9933")median_prices <- resale_data %>%group_by(`Property Type`) %>%summarize(Median_Price =median(`Unit Price ($ PSF)`))# Adjusting column header for kable displayresale_data <- resale_data %>%rename("Transacted_Price"="Transacted Price ($)") %>%rename("Area_SQFT"="Area (SQFT)") %>%rename("Unit_Price_PSF"="Unit Price ($ PSF)") %>%rename("Number_of_Units"="Number of Units") %>%rename("Project_Name"="Project Name")# Define the plotplot1 <-ggplot(resale_data, aes(x =`Unit_Price_PSF`,y =`Property Type`, fill =`Property Type`)) +geom_rect(aes(ymin =0.5, ymax =3.475, xmin =0, xmax =Inf), fill =NA, color ="#0080FF", linewidth =1, linetype ="solid") +geom_rect(aes(ymin =3.525, ymax =6.5, xmin =0, xmax =Inf), fill =NA, color ="#FF8000", linewidth =1, linetype ="solid") +# Color settingsscale_fill_manual(values = prop_col) +scale_y_discrete(labels =function(y) str_wrap(y, width =5)) +# Geometriesstat_halfeye(adjust =0.5, justification =-0.2, .width =1, point_colour =NA) +geom_boxplot(width =0.2, outlier.shape =NA) +stat_dots(side ="left", justification =1.2, binwidth =25, dotsize =0.02) +geom_text(data = median_prices, aes(x = Median_Price, y =`Property Type`, label =paste0("$", Median_Price)),hjust =-0.4, vjust =0, size =2) +annotate("text", x =Inf, y =2, label ="High Rise", vjust =1.5, fontface ="bold", color ="#0080FF", size =3) +annotate("text", x =Inf, y =5, label ="Landed", vjust =1.5, fontface ="bold", color ="#FF8000", size =3) +scale_x_continuous(breaks = scales::pretty_breaks(n =5), labels = scales::label_number(scale =1), limits =c(0, NA)) +# Coordinate and theme settingscoord_flip() +theme_ipsum(axis_title_size =10, base_size =7, grid ="Y") +theme(plot.margin =margin(10, 10, 10, 10),plot.title =element_text(size =10, hjust =0.3),axis.title.x =element_text(hjust =0.5, size =8),axis.title.y =element_text(hjust =0.5),plot.background =element_rect(fill ="#FFFFFF", colour ="#f5f5f5"),panel.border =element_blank(), legend.position ="top", legend.justification ="center", legend.direction ="horizontal", ) +guides(fill =guide_legend(nrow =1)) +# Labels and annotationslabs(x ="Unit Price ($ PSF)", y =NULL,title ="Distribution of Unit Price ($ PSF) across Property Types") # Display the plotplot1
3.3 Revising Plot II - Distribution of Transacted Price
Understanding the key statistics for the 2 property cluster of “High Rise” and “Landed”.
High Rise
Transacted_Price
Area_SQFT
Unit_Price_PSF
Min
$440,000
355.2
$286
Q1 (25%)
$1,231,120
710.4
$1,343
Median
$1,514,000
968.8
$1,676
Mean
$1,711,129
999.3
$1,767
Q3 (75%)
$1,954,320
1205.6
$2,150
Max
$18,800,000
8697.3
$4,504
Landed
Transacted_Price
Area_SQFT
Unit_Price_PSF
Min
$780,000
1,344
$252
Q1 (25%)
$2,575,000
1,778
$1,027
Median
$3,569,000
2,580
$1,389
Mean
$4,098,191
2,942
$1,470
Q3 (75%)
$4,280,000
3,577
$1,814
Max
$36,500,000
29,202
$4,456
Plotting ridge plot for the “High Rise” and “Landed” with the color codes by quantile.
Limiting the range of `Transacted Price ($)` to visualise the majority of the data points. This is justified as the outliers form a statistically small proportion and is removed for better representation of distribution.
Adjusting height through stat_density_ridges(height = 0.5)
Introducing transparency with stat_density_ridges(alpha = 0.5)
Including space between y-axis scale_y_discrete(expand = expansion(add = 0.1))
3.4 Combining Plot I and Plot II
For High Rise building,
Code
hr_plot <-grid.arrange(plot1a, plot2a, ncol =2) grid.rect(x =0.5, y =0.5, width =1, height =1, just =c("center", "center"), gp =gpar(col ="#0080FF", fill =NA, lwd =2))
Insights from High Rise (Unit Price $PSF vs Transacted Price $)
We observed that “Apartment” typically has the highest price for both Unit Price ($PSF) and Transacted Price ($), followed by “Condo” and lastly “Executive Condos”.
While the unit price of Executive Condo and Apartment is denser on the higher spectrums, the transacted price of these units are
With Transacted Price ($) generally being the function of Unit Price ($PSF) and Area (SQFT) for new launches, working backwards, we can infer that “Apartment” are likely to have smaller unit sizes. (Rough estimation below, further visualization with Area (SQFT) is needed to validate this assumption as figures may be distorted with inflation from resale units)
High Rise Type
Median Transacted Price ($)
Median Unit Price ($PSF)
Estimated Area (SQFT)
Executive Condos
$1,400,000
$1,326
1,055
Apartment
$1,676,855
$2,120.50
790
Condo
$1520,000
$1,631
932
For Landed properties,
Code
l_plot <-grid.arrange(plot1b, plot2b, ncol =2) grid.rect(x =0.5, y =0.5, width =1, height =1, just =c("center", "center"), gp =gpar(col ="#FF8000", fill =NA, lwd =2))
Insights from Landed Properties (Unit Price $PSF vs Transacted Price $)
Interestingly, the unit price of “Terrace” is the highest at $1,416 psf, followed by “Semi-Detached” at $1,263 psf and lastly, “Detached” of lowest at $1,067 psf.
Nevertheless, aligned with common perception, the Transacted Price ($) for “Detached” is the highest, with 75% approximately $14 million, and median price of $ 6.3 million. This is largely attributed to unit size (Area (SQFT)) not included in this visualisation.
3.5 Visualising relationship between Unit Price ($PSF) and Transacted Price ($) with scatter plot
To understand the relationship between Unit Price ($PSF) and Transacted Price ($), the most direct method is to use a scatter plot to identify any correlation. However, extremely high Transacted Price ($) due to exceptional purchases makes the visualisation skewed and majority of the Unit Price ($PSF) and Transacted Price ($) are clustered near the median.
Hence, ranking the transaction by percentile will evenly distribute the points within the scatter plot and identify any patterns.
3.5.1 - Having an end goal in mind.
With Tableau, we plotted a scatter chart and a combined interactive view
3.5.2 Overview of further areas for exploration with ggplotly and ggiraph.
Introducing trend line to show the correlation between Unit Price ($PSF) and Transacted Price ($).
Introducing separate views for each property type to de-clutter the scatter plot with ggplotly.
Introducing interactivity such that the viewer can either:
(a) Select price range of interest and see the list of past projects with ggiraph
(b) Select project of interest by filtering to see the price range of this project. (Test out in the sketch done in Tableau)
3.5.3 - Introducing separate view of scatter plot with ggplotly
To minimise clutter, ggplotly is used to display different views based on the respective property types.